Natural language processing to classify electrocardiograms in patients with syncope: A preliminary study

We report the preliminary findings of a simple NLP algorithm that can be applied to an ECG machine automated report to allow near-perfect classification of abnormal syncope ECGs. This NLP algorithm may be a valuable tool to help accurately interpret ECGs in patients with syncope and improve their risk stratification.


| INTRODUCTION
Syncope is a dramatic symptom, accounting for about 1.2% of all ED visits. 1,2 An electrocardiogram (ECG) is recommended in these patients. 3,4 Clinical risk stratification of patients with syncope includes interpretation of ECG findings. 1,[5][6][7][8][9] In risk stratification studies, an "abnormal" ECG is consistently the most important risk factor, especially when considering the risk of cardiac arrhythmia and/or sudden death. The characteristics of an "abnormal" ECG are similar across these studies. Where differences exist, they concern numerical cut points and criteria requiring subjective interpretation. Furthermore, few studies have considered the knowledge, interpretation, and time needed to apply criteria at the bedside. 7 A consensus conference of experts determined 11 important characteristics of an "abnormal" ECG. 10,11 All these characteristics are automatically generated and reported by ECG machines.
In this study, we use natural language processing (NLP) to extract the 11 syncope-specific criteria from a machine-generated ECG report and use them to classify abnormal ECGs for syncope.
We compare this classification to unstructured physician interpretation as well as the general summary classification of the machine report.

| METHODS
Commonly used decision tools and expert guidelines were reviewed to derive criteria for an "abnormal" syncope ECG. Table 1 summarizes the studies and guidelines. From this review, we determined that any of the following 11 criteria is indicative of an "abnormal" ECG in a patient with syncope: QTc > 470 ms, LBBB, QRS > 100 ms, Q waves, ST segment changes, PR < 120 ms, any AV block (Type I, II, III), left axis deviation, non-sinus rhythm (including paced), multiple PVCs, and sinus bradycardia < 40 beats/min.
An NLP algorithm was written in Python to extract the criteria from a typical ECG report (Figure 1). The report was extracted in XML format from the standard 12 lead ECG. The report was interpreted by the NLP algorithm and classified as abnormal if any "abnormal" characteristic was present.
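The extraction and classification step described above can be illustrated with a minimal sketch. This is not the study's script: the XML element names (`ECGReport`, `Measurements`, `Statement`, and so on), the regular expressions, and the function names are all assumptions made for illustration, and real vendor schemas and report wording will differ.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical report layout; real ECG vendors use their own XML schemas.
SAMPLE_REPORT = """<ECGReport>
  <Measurements>
    <HeartRate>38</HeartRate>
    <PRInterval>160</PRInterval>
    <QRSDuration>92</QRSDuration>
    <QTc>430</QTc>
  </Measurements>
  <Statements>
    <Statement>Sinus bradycardia</Statement>
    <Statement>Otherwise normal ECG</Statement>
  </Statements>
</ECGReport>"""

# Criteria reported in words. Patterns are illustrative, not exhaustive.
TEXT_CRITERIA = {
    "LBBB": r"left bundle branch block|\blbbb\b",
    "Q waves": r"\bq waves?\b",
    "ST segment changes": r"\bst\b.{0,40}(elevation|depression|abnormal|change)",
    "AV block": r"\bav block\b|atrioventricular block",
    "Left axis deviation": r"left axis deviation",
    "Non-sinus rhythm": r"atrial fibrillation|atrial flutter|paced|junctional",
    "Multiple PVCs": r"multiple (pvc|premature ventricular)",
}


def read_number(root, tag):
    """Return a measurement as a float, or None if the element is absent."""
    node = root.find(f"Measurements/{tag}")
    if node is None or not (node.text or "").strip():
        return None
    return float(node.text)


def abnormal_criteria(xml_text):
    """List every syncope criterion found in one XML report."""
    root = ET.fromstring(xml_text)
    text = " ".join((s.text or "") for s in root.iter("Statement")).lower()
    found = [name for name, pattern in TEXT_CRITERIA.items()
             if re.search(pattern, text)]

    # Criteria defined by numerical cut points (ms, or beats/min for rate).
    qtc = read_number(root, "QTc")
    qrs = read_number(root, "QRSDuration")
    pr = read_number(root, "PRInterval")
    rate = read_number(root, "HeartRate")
    if qtc is not None and qtc > 470:
        found.append("QTc > 470")
    if qrs is not None and qrs > 100:
        found.append("QRS > 100")
    if pr is not None and pr < 120:
        found.append("PR < 120")
    if rate is not None and rate < 40 and "sinus bradycardia" in text:
        found.append("Sinus bradycardia < 40")
    return found


def classify(xml_text):
    """An ECG is abnormal if any single criterion is present."""
    return "abnormal" if abnormal_criteria(xml_text) else "normal"


print(classify(SAMPLE_REPORT), abnormal_criteria(SAMPLE_REPORT))
```

Returning the list of matched criteria, rather than only the final label, makes each classification easy to audit against a manual reading of the same report.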
To test and refine the algorithm, it was applied to the first ECG from a random sample of 100 ED visits for syncope. These 100 ECGs also underwent precise manual application of the 11 criteria, giving 1100 manually checked criteria. The manual application was considered the gold standard for accuracy and classification. The NLP algorithm was applied after refinement and used to classify ECGs as abnormal based on these criteria. The performance of the final refined NLP algorithm in classifying the ECGs was compared with that of the machine-generated summary report and experienced physician interpretation. The machine-generated summary report was classified as "normal/borderline" versus "other/abnormal." The ECGs were also evaluated by two experienced board-certified emergency medicine physicians. The physicians were aware that the ECGs came from patients presenting to the emergency department with syncope but were given no specific criteria to apply.
They were asked to classify each ECG as "normal" or as "abnormal" if they felt it had a finding concerning for syncope. Accuracy, sensitivity, and specificity were calculated with 95% confidence intervals. All data were deidentified by the institution's health information system before being provided to the researchers for analysis. The protocol was approved by the Stanford University Institutional Review Board with an exemption from informed consent.
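As an illustration of how accuracy, sensitivity, and specificity with 95% confidence intervals can be computed against a gold standard, a minimal sketch follows. The source does not state which interval method was used; the Wilson score interval here is an assumption, the function names are hypothetical, and the labels are toy values, not study data.

```python
from math import sqrt


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return (centre - half, centre + half)


def diagnostic_metrics(gold, predicted):
    """Compare predicted labels with gold labels (True means abnormal)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g and p for g, p in pairs)
    tn = sum((not g) and (not p) for g, p in pairs)
    fp = sum((not g) and p for g, p in pairs)
    fn = sum(g and (not p) for g, p in pairs)
    counts = {
        "accuracy": (tp + tn, len(pairs)),
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
    }
    # Each entry: (point estimate, (lower bound, upper bound)).
    return {name: (k / n if n else float("nan"), wilson_ci(k, n))
            for name, (k, n) in counts.items()}


# Toy labels for illustration only.
gold = [True, True, True, False, False]
predicted = [True, True, False, False, True]
for name, (estimate, (low, high)) in diagnostic_metrics(gold, predicted).items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The Wilson interval is chosen in this sketch because it stays within 0 and 1 even when a proportion is close to 100%, which matters for small samples with near-perfect classification.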

| RESULTS
The initial application and assessment of the NLP script involved 1100 criteria from 100 ECGs, of which 62% had at least one "abnormal" criterion. The initial algorithm application correctly ...

| DISCUSSION
In this study, we showed that NLP of a standard ECG report can accurately identify 11 predetermined ECG criteria for syncope. Once identified, the criteria were used to correctly classify abnormal ECGs. The process does not require interpretation of the ECG waveforms and is faster and more accurate than experienced emergency physicians.
The process may improve the bedside assessment of ECG criteria for syncope and the performance of clinical decision tools.
A standard 12 lead ECG comes with the waveforms and a machine-generated report based on a computer algorithm used to interpret the raw ECG data. The use of computers to interpret ECGs goes back to 1961, with automated reports becoming standard since the late 1970s. [12][13][14] Over the years, improved algorithm techniques and more data from more leads have increased the accuracy of the reports. 15 Current machine-generated reports are very accurate, with small variations between different manufacturers. These differences usually involve interval measurements and interpretation. 16 Traditionally, machine-generated reports are overread by physicians for accuracy, and there are guidelines for their use.
More recently, machine-generated reports have been found useful for triage decisions in Emergency Departments. 17,18 In this study, we applied NLP techniques to the automated report and used manual interpretation of the variables as the gold standard. The initial scripting had 10 incorrect interpretations. Four of these were interval measurement discrepancies. Specifically, the automated report did not note first-degree AVB even though the ECG showed a PR interval greater than 200 ms. These were easily rescripted by using the numerical output rather than the written report for this criterion. Other misses occurred because the report spelled out "premature ventricular contractions" rather than using the abbreviation PVC, and one miss had both atrial fibrillation and sinus rhythm in the report. The missed features, although recurrent, were rare and easily rescripted to improve accuracy.
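The kinds of rescripting described here can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study's code: the function names and patterns are hypothetical, and treating any non-sinus rhythm statement as decisive when a report also mentions sinus rhythm is an assumed resolution that the source does not specify.

```python
import re

# Accept the abbreviation and the spelled-out form of PVC.
PVC_PATTERN = re.compile(
    r"\bpvc(?:'?s)?\b|premature ventricular (?:contractions?|complex(?:es)?)",
    re.IGNORECASE,
)
# Illustrative list of non-sinus rhythm statements; not exhaustive.
NON_SINUS_PATTERN = re.compile(
    r"atrial fibrillation|atrial flutter|paced|junctional", re.IGNORECASE
)
AVB_TEXT_PATTERN = re.compile(r"(?:first|1st)[- ]degree av block", re.IGNORECASE)


def first_degree_avb(pr_ms, statements):
    """Flag first-degree AV block from the numeric PR interval (> 200 ms)
    even when the written report does not mention it."""
    numeric_hit = pr_ms is not None and pr_ms > 200
    return numeric_hit or AVB_TEXT_PATTERN.search(statements) is not None


def mentions_pvcs(statements):
    """Match PVCs whether abbreviated or written in full."""
    return PVC_PATTERN.search(statements) is not None


def non_sinus_rhythm(statements):
    """Assumption: a non-sinus statement wins over a co-occurring
    'sinus rhythm' statement in the same report."""
    return NON_SINUS_PATTERN.search(statements) is not None


print(first_degree_avb(212, "Sinus rhythm"))
print(mentions_pvcs("Sinus rhythm with premature ventricular contractions"))
print(non_sinus_rhythm("Atrial fibrillation; sinus rhythm"))
```

Checking the numeric measurement first keeps the criterion independent of how, or whether, the report words the finding.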
The preliminary development of the NLP algorithm involved the assessment of 1100 manually checked criteria but was limited to 100 ECGs and only one type of machine-generated report.
Further testing and interpretation of other machine-generated reports are warranted. However, with the standardization of machines and reports, we would expect little or no difference in our findings. For the purposes of NLP scripting, any automated report that can be converted to XML format can be interpreted and classified. If these automated reports and formats could be imported into a bedside clinical application, widespread validation and implementation could be undertaken.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

ETHICS STATEMENT
The protocol was approved by the Stanford University Institutional Review Board with an exemption from informed consent under the policies of the US Federal Policy for the Protection of Human Subjects Research.

TRANSPARENCY STATEMENT
The lead author James Quinn affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered)